Workflow

This workflow analyzes single-cell transcriptomics of cells infected with barcoded influenza virions. The experiment makes it possible to accurately count the unique virions infecting each cell and to measure how that number affects the cell's transcriptome. The single-cell transcriptomics were performed using the 10X Chromium platform.

The basic steps in the analysis are as follows:

Detailed software versions can be found under Rules.

Results

File: fastq10x_qc_analysis.html (336.2 kB)
Description: HTML rendering of Jupyter notebook analyzing quality-control statistics from the generation of the 10X FASTQ files using cellranger mkfastq.
Job properties: Rule fastq10x_qc_analysis

File: align_fastq10x_summary.html (398.3 kB)
Description: HTML rendering of Jupyter notebook analyzing statistics from the STARsolo alignments of the 10X Illumina FASTQ files.
Job properties: Rule align_fastq10x_summary

File: wt_virus_pilot_analyze_cell_gene_matrix.html (497.2 kB)
Description: HTML rendering of Jupyter notebook analyzing the cell-gene matrix for wt_virus_pilot.
Job properties: Rule analyze_cell_gene_matrix; Wildcards sample10x=wt_virus_pilot

File: count_viraltags_fastq10x-wt_virus_pilot.html (375.9 kB)
Description: HTML rendering of Jupyter notebook that calls the viral tags for wt_virus_pilot.
Job properties: Rule count_viraltags_fastq10x; Wildcards sample10x=wt_virus_pilot

File: viral_fastq10x_coverage.html (541.8 kB)
Description: HTML rendering of Jupyter notebook analyzing the coverage of the viral genes (including viral tags and viral barcodes) in the aligned 10X Illumina FASTQ reads.
Job properties: Rule viral_fastq10x_coverage

Statistics

If the workflow was executed on a cluster or in the cloud, the runtimes include the time jobs spent waiting in the queue.

Configuration

File Code
config.yaml
# YAML configuration file for the analysis

# max CPUs used by any rules
max_cpus: 16

# file specifying 10X Illumina runs
illumina_runs_10x: data/illumina_runs_10x.csv

# output directories
fastq10x_dir: results/fastq10x  # FASTQ files & QC stats for 10X Illumina runs
mkfastq10x_dir: results/fastq10x/mkfastq_output  # `cellranger mkfastq` output
genome_dir: results/genomes  # location of downloaded genomes and annotations
refgenome: results/genomes/refgenome  # STAR reference genome directory
aligned_fastq10x_dir: results/aligned_fastq10x  # aligned 10X Illumina reads
viral_fastq10x_dir: results/viral_fastq10x  # viral tags / barcodes in 10X reads
analysis_dir: results/analysis  # fine-grained analyses

# cellular genome and GTF ftp sites
cell_genome_ftp: ftp://ftp.ensembl.org/pub/release-98/fasta/canis_familiaris/dna/Canis_familiaris.CanFam3.1.dna.toplevel.fa.gz
cell_gtf_ftp: ftp://ftp.ensembl.org/pub/release-98/gtf/canis_familiaris/Canis_familiaris.CanFam3.1.98.gtf.gz

# viral genome (FASTA), GTF, and Genbank file locations
viral_genome: data/flu_sequences/flu-CA09.fasta
viral_gtf: data/flu_sequences/flu-CA09.gtf
viral_genbank: data/flu_sequences/flu-CA09.gb

# file giving nucleotide identities at viral tag sites
viraltag_identities: data/flu_sequences/flu-CA09_viral_tags.yaml

# STAR alignment parameters. These settings reduce the penalty for
# non-canonical splice sites, which is probably bad for mapping cellular
# reads but is good for mapping viral reads which will have deletions
# not corresponding to splice sites.
scoreGapNoncan: -4
scoreGapGCAG: -4
scoreGapATAC: -4

# URL location of 10X barcode whitelist: **this is for the v3 kit**
cb_whitelist_10x_url: https://github.com/10XGenomics/cellranger/raw/master/lib/python/cellranger/barcodes/3M-february-2018.txt.gz
cb_whitelist_10x: results/aligned_fastq10x/cb_whitelist_10x.txt

cb_len_10x: 16  # length of 10X cell barcode
umi_len_10x: 12  # length of 10X UMI: **this is for the v3 kit**

expect_ncells: 6000  # expected cells per 10X run, for "knee" cell calling
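
As a rough illustration of how the settings above might reach the aligner, the following Python sketch turns a few of the config values into STAR/STARsolo command-line options. The function name `star_solo_args` is hypothetical. The read layout assumed here (cell barcode at the start of read 1, UMI immediately after it) and the use of `expect_ncells` as the first number of the `CellRanger2.2` cell filter, followed by STAR's default percentile and ratio, are assumptions that this report does not confirm. The workflow's actual `align_fastq10x` rule may build its command differently.

```python
# Values copied from config.yaml above.
config = {
    "scoreGapNoncan": -4,
    "scoreGapGCAG": -4,
    "scoreGapATAC": -4,
    "cb_whitelist_10x": "results/aligned_fastq10x/cb_whitelist_10x.txt",
    "cb_len_10x": 16,
    "umi_len_10x": 12,
    "expect_ncells": 6000,
}


def star_solo_args(config):
    """Build a list of STAR/STARsolo options from the config (hypothetical helper)."""
    cb_len = config["cb_len_10x"]
    args = [
        # Reduced penalties for non-canonical splice junctions.
        "--scoreGapNoncan", config["scoreGapNoncan"],
        "--scoreGapGCAG", config["scoreGapGCAG"],
        "--scoreGapATAC", config["scoreGapATAC"],
        # Droplet-style barcodes: one cell barcode and one UMI per read.
        "--soloType", "CB_UMI_Simple",
        "--soloCBwhitelist", config["cb_whitelist_10x"],
        # Assumed layout: barcode at position 1, UMI directly after it (1-based).
        "--soloCBstart", 1,
        "--soloCBlen", cb_len,
        "--soloUMIstart", cb_len + 1,
        "--soloUMIlen", config["umi_len_10x"],
        # Assumed cell filter: expected cells, then STAR's default 0.99 and 10.
        "--soloCellFilter", "CellRanger2.2", config["expect_ncells"], 0.99, 10,
    ]
    return [str(a) for a in args]


print(" ".join(star_solo_args(config)))
```

With the v3 kit lengths in the config, the UMI would start at position 17 of the barcode read under this assumed layout. Switching to a kit with a different UMI length would then only require changing `umi_len_10x` and the whitelist.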

Rules

Rule Jobs Output Singularity Conda environment Code
fastq10x_qc_analysis 1
  • results/fastq10x/fastq10x_qc_analysis.ipynb
  • results/fastq10x/fastq10x_qc_analysis.html
source
align_fastq10x_summary 1
  • results/aligned_fastq10x/align_fastq10x_summary.ipynb
  • results/aligned_fastq10x/align_fastq10x_summary.html
source
viral_fastq10x_coverage 1
  • results/viral_fastq10x/viraltag_locs.csv
  • results/viral_fastq10x/viralbc_locs.csv
  • results/viral_fastq10x/viral_fastq10x_coverage.ipynb
  • results/viral_fastq10x/viral_fastq10x_coverage.html
source
count_viraltags_fastq10x 1
  • results/viral_fastq10x/count_viraltags_fastq10x-wt_virus_pilot.ipynb
  • results/viral_fastq10x/count_viraltags_fastq10x-wt_virus_pilot.html
  • results/viral_fastq10x/wt_virus_pilot_viraltag_counts.csv
source
analyze_cell_gene_matrix 1
  • results/analysis/wt_virus_pilot_analyze_cell_gene_matrix.ipynb
  • results/analysis/wt_virus_pilot_analyze_cell_gene_matrix.html
source
make_fastq10x 1
  • results/fastq10x/wt_virus_pilot-2019-12-03_all_R1.fastq.gz
  • results/fastq10x/wt_virus_pilot-2019-12-03_all_R2.fastq.gz
  • results/fastq10x/mkfastq_output/wt_virus_pilot-2019-12-03
  • results/fastq10x/wt_virus_pilot-2019-12-03_qc_stats.csv
  • _mkfastq_wt_virus_pilot-2019-12-03.csv
  • __wt_virus_pilot-2019-12-03.mro
source
align_fastq10x 1
  • results/aligned_fastq10x/wt_virus_pilot/Solo.out/Gene/Summary.csv
  • results/aligned_fastq10x/wt_virus_pilot/Solo.out/Gene/UMIperCellSorted.txt
  • results/aligned_fastq10x/wt_virus_pilot/Solo.out/Gene/filtered/matrix.mtx
  • results/aligned_fastq10x/wt_virus_pilot/Solo.out/Gene/filtered/features.tsv
  • results/aligned_fastq10x/wt_virus_pilot/Solo.out/Gene/filtered/barcodes.tsv
  • results/aligned_fastq10x/wt_virus_pilot/Aligned.sortedByCoord.out.bam
source
index_bam 1
  • results/aligned_fastq10x/wt_virus_pilot/Aligned.sortedByCoord.out.bam.bai
samtools index {input} {output}
get_cb_whitelist_10x 1
  • results/aligned_fastq10x/cb_whitelist_10x.txt
if [[ {params.url} == *.gz ]]
then
    wget -O - {params.url} | gunzip -c > {output}
else
    wget -O - {params.url} > {output}
fi
make_refgenome 1
  • results/genomes/cell_and_virus_gtf.gtf
  • results/genomes/refgenome
cat {input.cell_gtf} {input.viral_gtf} > {output.concat_gtf}
mkdir -p {output.genomeDir}
STAR --runThreadN {threads} --runMode genomeGenerate --genomeDir {output.genomeDir} \
     --genomeFastaFiles {input.cell_genome} {input.viral_genome} \
     --sjdbGTFfile {output.concat_gtf}
get_cell_genome 1
  • results/genomes/cell_genome.fasta
wget -O - {params.ftp} | gunzip -c > {output}
get_cell_gtf 1
  • results/genomes/cell_gtf.gtf
wget -O - {params.ftp} | gunzip -c > {output}
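
The download rules above all follow the same pattern: stream a URL to a file and decompress it on the way if it is gzipped. For readers who want the same behavior without `wget` and `gunzip`, here is a minimal Python sketch using only the standard library. The function name `fetch` is hypothetical and is not part of the workflow, which uses the shell commands shown above.

```python
import gzip
import shutil
import urllib.request


def fetch(url, output):
    """Download `url` to `output`, gunzipping on the fly if the URL ends in .gz."""
    with urllib.request.urlopen(url) as response, open(output, "wb") as out:
        if url.endswith(".gz"):
            # Wrap the response stream so decompression happens while copying.
            with gzip.GzipFile(fileobj=response) as unzipped:
                shutil.copyfileobj(unzipped, out)
        else:
            shutil.copyfileobj(response, out)
```

A call such as `fetch(config["cell_gtf_ftp"], "results/genomes/cell_gtf.gtf")` would mirror the `get_cell_gtf` rule, assuming `config` holds the parsed config.yaml.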